multi-omic data integration
BayReL: Bayesian Relational Learning for Multi-omics Data Integration
High-throughput molecular profiling technologies have produced high-dimensional multi-omics data, enabling systematic understanding of living systems at the genome scale. Studying molecular interactions across different data types helps reveal signal transduction mechanisms across different classes of molecules. In this paper, we develop a novel Bayesian representation learning method that infers the relational interactions across multi-omics data types. Our method, Bayesian Relational Learning (BayReL) for multi-omics data integration, takes advantage of a priori known relationships among the same class of molecules, modeled as a graph at each corresponding view, to learn view-specific latent variables as well as a multi-partite graph that encodes the interactions across views. Our experiments on several real-world datasets demonstrate enhanced performance of BayReL in inferring meaningful interactions compared to existing baselines.
BayReL: Bayesian Relational Learning for Multi-omics Data Integration: Supplementary Materials
To further clarify the model and workflow of BayReL, we provide a schematic illustration in Figure S1, which includes only two views for clarity. Figure S2 shows the bipartite network of the top 200 interactions inferred by BayReL. Figure S1: Schematic illustration of BayReL. Figure S2: The bipartite sub-network with the top 200 interactions inferred by BayReL in the AML data. Genes and drugs are shown as blue and red nodes, respectively. D. Details on the experimental setups, hyper-parameter selection, and run time: We train the model for 1000 epochs and use the validation set for early stopping. Each training epoch for CF, BRCA, and AML took 0.01, 0.42, ... In all experiments, we used the CCAGFA R package as the official implementation of BCCA.
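The training schedule described above (a fixed epoch budget with early stopping on a validation set) can be sketched as follows. This is a minimal illustration, not the authors' training code: the `patience` parameter and the `train_step` and `validate` callables are hypothetical, since the excerpt states only the 1000-epoch budget and the use of a validation set.

```python
import math


def train_with_early_stopping(train_step, validate, max_epochs=1000, patience=20):
    """Run up to max_epochs, stopping once validation loss stops improving.

    train_step(epoch) runs one training epoch; validate(epoch) returns the
    validation loss. Both are hypothetical callables supplied by the caller,
    and patience is an assumed hyper-parameter.
    """
    best_loss = math.inf
    best_epoch = -1
    for epoch in range(max_epochs):
        train_step(epoch)
        val_loss = validate(epoch)
        if val_loss < best_loss:
            # New best validation loss: remember it and keep training.
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop early.
            break
    return best_epoch, best_loss


# Toy usage: the validation loss falls until epoch 30, then rises.
def toy_validate(epoch):
    return (epoch - 30) ** 2 / 100.0 + 1.0


best_epoch, best_loss = train_with_early_stopping(
    lambda epoch: None, toy_validate, patience=5
)
```

In practice the model parameters from `best_epoch` would be checkpointed and restored; that bookkeeping is omitted here.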
Review for NeurIPS paper: BayReL: Bayesian Relational Learning for Multi-omics Data Integration
Summary and Contributions: In this paper, the authors propose a Bayesian representation learning framework that can infer links between heterogeneous graphs generated from multi-omics datasets. The main idea is to use the underlying relationship information within each dataset (or view) by modeling it as a graph. The method has four steps: (1) embed the nodes of each view-specific graph into the same latent space; (2) generate a multi-view adjacency tensor using the similarity scores for node embeddings across views; (3) infer the prior over latent variables from the node embeddings and multi-view graphs, and the posterior from the view-specific data; (4) perform variational inference to optimize the model parameters and variational parameters. The paper attempts to solve the important problem of multi-omics data integration by learning the relationships that can exist between different modalities, modeling the task as multi-view link prediction. This work could be useful to the broader ML community.
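Steps (1) and (2) of the summary above can be illustrated with a small sketch. It is not the authors' implementation: the embeddings are fixed toy vectors rather than outputs of a learned graph encoder, and scoring a cross-view pair by the sigmoid of an inner product is an assumed choice of similarity. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_view_scores(emb_a, emb_b):
    """Return a |A| x |B| matrix of interaction scores in (0, 1).

    Each score is the sigmoid of the inner product of two node embeddings,
    an assumed similarity function chosen for illustration.
    """
    return [
        [sigmoid(sum(x * y for x, y in zip(u, v))) for v in emb_b]
        for u in emb_a
    ]


def top_interactions(scores, k):
    """Return the k highest-scoring (row, col, score) triples."""
    triples = [(i, j, s) for i, row in enumerate(scores) for j, s in enumerate(row)]
    return sorted(triples, key=lambda t: t[2], reverse=True)[:k]


# Toy embeddings (hypothetical): 3 genes and 2 drugs in a shared 2-D latent space.
gene_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
drug_emb = [[2.0, 0.0], [0.0, -2.0]]

scores = cross_view_scores(gene_emb, drug_emb)
top = top_interactions(scores, k=2)
```

Ranking the cross-view scores and keeping the top entries mirrors how a "top 200 interactions" sub-network could be read off a learned bipartite graph, under the assumptions stated above.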
Review for NeurIPS paper: BayReL: Bayesian Relational Learning for Multi-omics Data Integration
The paper proposes a Bayesian formulation for the integration of multi-omics datasets by combining within-view and between-view interactions. Although the paper is conceptually related to prior work, the reviewers appreciate the contributions made, which are both timely and relevant to the NeurIPS community. Overall, this is a solid submission, and the authors convincingly address the concerns raised in their rebuttal.
Supervised Multiple Kernel Learning approaches for multi-omics data integration
Briscik, Mitja, Tazza, Gabriele, Dillies, Marie-Agnes, Vidács, László, Dejean, Sébastien
Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently a challenge for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach to account for the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining. We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can compete with more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics genomic data. Our results offer a direction for bio-data mining research and further development of methods for heterogeneous data integration.
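The idea of a meta-kernel built from per-view input kernels can be sketched as below. This is one simple fusion strategy (a fixed convex combination of RBF Gram matrices), shown as an assumption for illustration; the abstract does not specify which fusion strategies the authors use, and the data, `gamma` values, and weights are hypothetical.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for the rows of X."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [[math.exp(-gamma * sqdist(u, v)) for v in X] for u in X]


def fuse_kernels(kernels, weights):
    """Meta-kernel as a convex combination of per-view Gram matrices.

    A non-negative weighted sum of positive semi-definite kernels is itself
    positive semi-definite, so the result is a valid kernel.
    """
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): the same 3 samples measured in two omics views.
expression = [[0.0, 1.0], [0.1, 0.9], [3.0, 3.0]]
methylation = [[1.0], [1.2], [5.0]]

K_meta = fuse_kernels(
    [rbf_kernel(expression, gamma=0.5), rbf_kernel(methylation, gamma=0.5)],
    weights=[0.5, 0.5],
)
```

A fused Gram matrix like `K_meta` could then be passed to any classifier that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.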
MoReL: Multi-omics Relational Learning
Hasanzadeh, Arman, Hajiramezanali, Ehsan, Duffield, Nick, Qian, Xiaoning
Multi-omics data analysis has the potential to discover hidden molecular interactions, revealing potential regulatory and/or signal transduction pathways for cellular processes of interest when studying life and disease systems. One of the critical challenges in dealing with real-world multi-omics data is that they may exhibit heterogeneous structures and data quality, because existing data are often collected from different subjects under different conditions for each type of omics data. We propose a novel deep Bayesian generative model to efficiently infer a multi-partite graph that encodes molecular interactions across such heterogeneous views, using a fused Gromov-Wasserstein (FGW) regularization between latent representations of corresponding views for integrative analysis. This optimal transport regularization not only allows the deep Bayesian generative model to incorporate view-specific side information, with either graph-structured or unstructured data in different views, but also increases model flexibility through distribution-based regularization. This allows efficient alignment of heterogeneous latent variable distributions to derive reliable interaction predictions compared to the existing point-based graph embedding methods. Our experiments on several real-world datasets demonstrate the enhanced performance of MoReL in inferring meaningful interactions compared to existing baselines. Multi-view learning tries to fully leverage the information from multiple sources (i.e. ...). In biomedical applications, the shared embedding space also enables better understanding of the underlying biological mechanisms by discovering interactions between different types of molecules, which is our focus in this paper.
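For readers unfamiliar with the fused Gromov-Wasserstein (FGW) regularization mentioned in the abstract above, the discrepancy is commonly written in the following general form. The notation is an assumption introduced here, not taken from the abstract: $z_i$ and $z'_j$ are latent representations in two views, $d$ is a distance between them, $C_1$ and $C_2$ are within-view structure matrices, $\Pi(p, q)$ is the set of couplings with marginals $p$ and $q$, and $\alpha \in [0, 1]$ trades off the feature term against the structure term. How MoReL instantiates each quantity is not stated in the abstract.

```latex
\mathrm{FGW}_{\alpha}(p, q)
  = \min_{\pi \in \Pi(p, q)}
    \sum_{i,j,k,l}
    \Big[ (1 - \alpha)\, d(z_i, z'_j)^2
        + \alpha\, \big| C_1(i, k) - C_2(j, l) \big|^2 \Big]
    \, \pi_{ij}\, \pi_{kl}
```

With $\alpha = 0$ the expression reduces to a Wasserstein cost on the features alone, and with $\alpha = 1$ to a Gromov-Wasserstein cost on the within-view structures alone.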